Applications in Computer Vision

FIGURE 6.12

Convergence of Faster-RCNN with a ResNet-18 backbone (left) and SSD with a VGG-16 backbone (right) under different binarization methods, trained on VOC trainval2007 and trainval2012.

FIGURE 6.13

Input images and saliency maps, generated following [79]. The images are randomly selected from VOC test2007. Each row shows: (a) the input image, and saliency maps of (b) Faster-RCNN with a ResNet-101 backbone (Res101), (c) Faster-RCNN with a ResNet-18 backbone (Res18), and (d) 1-bit Faster-RCNN with a ResNet-18 backbone (BiRes18).
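The saliency maps in the caption, and the teacher–student information discrepancy discussed below, can be illustrated with a toy sketch. This is not the exact method of [79] (which is not specified in this excerpt): it assumes a gradient-based saliency, uses a linear scorer so the input gradient is analytic rather than requiring autograd, and measures discrepancy as the mean squared difference between two saliency maps. All names and shapes are illustrative.

```python
import numpy as np

# Toy sketch: gradient-based saliency maps and a teacher-student
# "information discrepancy" between them. Assumes a linear scorer
# score(x) = sum(w * x), so the input gradient is just w; a real
# detector would obtain the gradient via autograd.

def saliency_map(w):
    """|d score / d x| for a linear scorer, max-reduced over channels.

    w: weights of shape (C, H, W). Returns an (H, W) saliency map.
    """
    return np.abs(w).max(axis=0)

def discrepancy(s_teacher, s_student):
    """Mean squared difference between two saliency maps."""
    return float(np.mean((s_teacher - s_student) ** 2))

rng = np.random.default_rng(0)
w_teacher = rng.normal(size=(3, 8, 8))                         # e.g., a Res101 teacher
w_student_real = w_teacher + 0.1 * rng.normal(size=(3, 8, 8))  # real-valued Res18 student
w_student_1bit = np.sign(w_teacher)                            # BiRes18: binarized weights

s_t = saliency_map(w_teacher)
# A real-valued student stays close to the teacher; binarization
# collapses weight magnitudes, so the 1-bit student drifts further.
d_real = discrepancy(s_t, saliency_map(w_student_real))
d_1bit = discrepancy(s_t, saliency_map(w_student_1bit))
print(d_real < d_1bit)
```

In this toy setup the real-valued student's saliency stays near the teacher's while the 1-bit student's does not, mirroring the gap between panels (c) and (d) of Fig. 6.13.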

that knowledge distillation (KD) methods such as [235] are effective for distilling real-valued Faster-RCNNs only when the teacher model and its student counterpart share a small information discrepancy on proposals, as shown in Fig. 6.13 (b) and (c). This does not hold for the 1-bit Faster-RCNN, as shown in Fig. 6.13 (b) and (d), which might explain why existing KD methods are less effective on 1-bit detectors. Statistics on the COCO and PASCAL VOC datasets in Fig. 6.14 show that the discrepancy between the